Skip to content

GH-45523: [R] Implement Utf8View type bindings#49712

Draft
thisisnic wants to merge 9 commits intoapache:mainfrom
thisisnic:GH-45523-ipc-polars
Draft

GH-45523: [R] Implement Utf8View type bindings#49712
thisisnic wants to merge 9 commits intoapache:mainfrom
thisisnic:GH-45523-ipc-polars

Conversation

@thisisnic
Copy link
Copy Markdown
Member

@thisisnic thisisnic commented Apr 11, 2026

Rationale for this change

No bindings for Utf8View type in the R package

What changes are included in this PR?

Implement bindings

Are these changes tested?

Yep

Are there any user-facing changes?

Yep, adding functionality.

AI Usage

Heavily used Codex/Claude here. I'm not confident of every line of code. I read things over, and iterated on it making sure that tests pass and nothing seemed wildly incorrect.

@thisisnic
Copy link
Copy Markdown
Member Author

We should rebase once #49710 is merged as changes from that PR are in this branch as they were needed to make it work.

@thisisnic thisisnic force-pushed the GH-45523-ipc-polars branch from e8768d0 to bb711a7 Compare May 5, 2026 16:41
Comment thread r/src/array_to_vector.cpp
Comment on lines +627 to +630
cpp11::stop(
"Cannot convert Dictionary Array of type `%s` to R: dictionary has "
"more levels than an R factor can represent",
dict_type.ToString().c_str());
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add a test for this

@github-actions github-actions Bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels May 5, 2026
Comment on lines +829 to +830
return this->value_builder_->Append(
std::string_view(view_.bytes, static_cast<size_t>(view_.size)));
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old code passed (const char*, int32_t) which matches StringBuilder::Append but not StringViewBuilder::Append (which takes int64_t). Switching to std::string_view works for both builder types.

Comment thread r/src/r_to_arrow.cpp
Comment on lines +913 to +954
template <typename T>
class RPrimitiveConverter<T, enable_if_string_view<T>>
: public PrimitiveConverter<T, RConverter> {
public:
Status Extend(SEXP x, int64_t size, int64_t offset = 0) override {
RVectorType rtype = GetVectorType(x);
if (rtype != STRING) {
return Status::Invalid("Expecting a character vector");
}
return UnsafeAppendUtf8Strings(arrow::r::utf8_strings(x), size, offset);
}

void DelayedExtend(SEXP values, int64_t size, RTasks& tasks) override {
auto task = [this, values, size]() { return this->Extend(values, size); };
tasks.Append(false, std::move(task));
}

private:
Status UnsafeAppendUtf8Strings(const cpp11::strings& s, int64_t size, int64_t offset) {
RETURN_NOT_OK(this->primitive_builder_->Reserve(size - offset));
const SEXP* p_strings = reinterpret_cast<const SEXP*>(DATAPTR_RO(s)) + offset;

int64_t total_length = 0;
for (R_xlen_t i = offset; i < size; i++, ++p_strings) {
SEXP si = *p_strings;
total_length += si == NA_STRING ? 0 : LENGTH(si);
}
RETURN_NOT_OK(this->primitive_builder_->ReserveData(total_length));

p_strings = reinterpret_cast<const SEXP*>(DATAPTR_RO(s)) + offset;
for (R_xlen_t i = offset; i < size; i++, ++p_strings) {
SEXP si = *p_strings;
if (si == NA_STRING) {
this->primitive_builder_->UnsafeAppendNull();
} else {
this->primitive_builder_->UnsafeAppend(CHAR(si), LENGTH(si));
}
}

return Status::OK();
}
};
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm less confident about this entire block of code

@thisisnic
Copy link
Copy Markdown
Member Author

A lot of .Rd updates as I used a newer roxygen2 version.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant